Video Editing Based on Situation Awareness from Voice Information and Face Emotion

Authors

Tetsuya Takiguchi

Jun Adachi

Yasuo Ariki

Abstract

Video cameras are becoming popular in home environments, where they are often used to record everyday events such as family growth and small home parties. Home video content, however, is heavily constrained because there is no production staff (cameraman, editor, switcher, and so on) as there is at a broadcasting or television station. When we watch broadcast video, camera work such as panning and zooming helps us stay interested in the content and understand it easily. This means that camera work is strongly associated with the events in the video, and the most appropriate camera work is chosen according to those events. By combining camera work with event recognition, more interesting and intelligible video content can be produced (Ariki et al., 2006).

Audio is a key index in digital video and can provide useful information for video retrieval. Audio features have been used for video scene segmentation (Sundaram et al., 2000) and for video retrieval (Aizawa, 2005; Amin et al., 2004), and multiple microphones have been used for detecting and separating audio in meeting recordings (Asano et al., 2006). Rui et al. (2004) describe an automated system for capturing and broadcasting lectures to an online audience, in which a two-channel microphone is used to locate talking audience members in a lecture room. Many approaches to content production for home video are also possible, such as generating highlights and summaries (Ozeke et al., 2005; Hua et al., 2004; Adams et al., 2005; Wu, 2004). In addition, some studies have focused on facial direction and facial expression for analyzing viewer behavior.
Yamamoto et al. (2006) proposed a system that automatically estimates the time intervals during which TV viewers have a positive interest in what they are watching, based on temporal patterns of facial changes modeled with a hidden Markov model.

In this chapter, we study home video editing based on audio and face emotion. In home environments, it may be difficult for one person to record video continuously (especially at a small home party with just two people), so the video content needs to be recorded automatically, without a cameraman. However, this may produce a large volume of video content. This in turn requires digital camera work, which performs virtual panning and zooming by clipping frames from high-resolution images and controlling the frame size and position (Ariki et al., 2006).

Source: Digital Video, edited by Floriano De Rango, ISBN 978-953-7619-70-1, pp. 500, February 2010, INTECH, Croatia, downloaded from SCIYO.COM
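The digital camera work mentioned in the abstract (virtual panning and zooming by clipping a sub-frame from a high-resolution image) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `clip_window`, its parameters, and the clamping rule are assumptions made here purely for illustration. The sketch only computes where the clipped window lies inside the source frame for a given pan centre and zoom factor.

```python
def clip_window(src_w, src_h, center_x, center_y, zoom):
    """Return (left, top, width, height) of a clipped sub-frame.

    Hypothetical helper, not taken from the chapter:
    - src_w, src_h: size of the high-resolution source frame in pixels
    - center_x, center_y: desired centre of the virtual camera (panning)
    - zoom: magnification factor, 1.0 means the full frame (zooming)
    """
    if zoom < 1.0:
        raise ValueError("zoom must be >= 1.0")

    # Zooming in shrinks the clipped window; the aspect ratio is preserved.
    win_w = round(src_w / zoom)
    win_h = round(src_h / zoom)

    # Panning moves the window centre; clamp so the window stays
    # entirely inside the source frame.
    left = min(max(round(center_x - win_w / 2), 0), src_w - win_w)
    top = min(max(round(center_y - win_h / 2), 0), src_h - win_h)
    return left, top, win_w, win_h


# A 2x zoom centred on the middle of a 1920x1080 frame.
print(clip_window(1920, 1080, 960, 540, 2.0))  # (480, 270, 960, 540)
```

In practice the clipped region would be resampled to the output resolution, and the window centre and zoom would be driven by the recognized events (for example, the current speaker's position); those steps are outside the scope of this sketch.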

Similar resources

Etiologies of Dysphonia in Patients Referred to ENT Clinics Based on videolaryngoscopy

Introduction: Laryngeal dysfunction may be divided into three categories; organic, neurologic and functional disorders. Dysphonia and hoarseness are the most common symptoms and, in some cases, the only signs of laryngeal dysfunction. In differential diagnosis of any type of chronic hoarseness, a neoplastic process must be considered and, thus continuous light video laryngoscopy can provide imp...

Emotion perception by eye and ear and halves and wholes

How is the perception of emotion affected by the provision of multiple sources of information (both within and across modality)? We examined how the perception of emotion differed depending upon which face regions were visible and which modality (auditory or visual, AV) was used. Auditory and visual speech of five talkers expressing anger, disgust, fear, happy, sad, surprise or neutral emotion ...

When dynamic, the head and face alone can express pride.

Prior research suggested that pride is recognized only when a head and facial expression (e.g., tilted head with a slight smile) is combined with a postural expression (e.g., expanded body and arm gestures). However, these studies used static photographs. In the present research, participants labeled the emotion conveyed by four dynamic cues to pride, presented as video clips: head and face alo...

Audio-visual synchrony for detection of monologues in video archives

In this paper we present our approach to detect monologues in video shots. A monologue shot is defined as a shot containing a talking person in the video channel with the corresponding speech in the audio channel. Whilst motivated by the TREC 2002 Video Retrieval Track (VT02), the underlying approach of synchrony between audio and video signals are also applicable for voice and face-based biome...

FILTWAM and Voice Emotion Recognition

This paper introduces the voice emotion recognition part of our framework for improving learning through webcams and microphones (FILTWAM). This framework enables multimodal emotion recognition of learners during game-based learning. The main goal of this study is to validate the use of microphone data for a real-time and adequate interpretation of vocal expressions into emotional states were t...

Publication date: 2017
